Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: Transform X % C == 0 to X & (C-1) == 0 #25744

Closed
wants to merge 9 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 17, 2019

Optimize X % C == 0 to X & (C-1) == 0. This expression is used in DateTime.IsLeapYear.

C#:

bool Test(int year) => year % 4 == 0;

Old:

; Method T:Test(int):bool
G_M29623_IG01:

G_M29623_IG02:
       mov      eax, edx
       sar      eax, 31
       and      eax, 3
       add      eax, edx
       and      eax, -4
       sub      edx, eax
       sete     al
       movzx    rax, al

G_M29623_IG03:
       ret      
; Total bytes of code: 22

New:

; Method T:Test(int):bool

G_M29623_IG01:

G_M29623_IG02:
       test     dl, 3
       sete     al
       movzx    rax, al

G_M29623_IG03:
       ret      
; Total bytes of code: 10

Will run jit diffs later.

@EgorBo EgorBo changed the title JIT: Transform X % C == 0 to % & (C-1) == 0 JIT: Transform X % C == 0 to X & (C-1) == 0 Jul 17, 2019
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
src/jit/morph.cpp Outdated Show resolved Hide resolved
@mikedn
Copy link

mikedn commented Jul 17, 2019

Overall seems fine but I'm not sure if this is the best way to do it. Another way is to simply change the existing GT_MOD to GT_UMOD, that will allow lowering to transform it into the X & (C - 1).

@EgorBo
Copy link
Member Author

EgorBo commented Jul 17, 2019

@mikedn hm... indeed! Changed MOD to UMOD.

TODO:

        bool Test1(int year) => year % 4 != 0; // GT_GT instead of GT_NE

        int Test2(int year)
        {
            if (year > 0)
                return year % 4; // year is guaranteed to be positive
            return 0;
        }

@mikedn
Copy link

mikedn commented Jul 17, 2019

The part about figuring out that year is positive due to the if check is currently problematic in the JIT.

@mikedn
Copy link

mikedn commented Jul 17, 2019

bool Test1(int year) => year % 4 != 0; // GT_GT instead of GT_NE

I don't quite understand where does GT_GT comes from.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 17, 2019

bool Test1(int year) => year % 4 != 0; // GT_GT instead of GT_NE

I don't quite understand where does GT_GT comes from.

I guess from Roslyn:

bool Test1(int year) => year % 4 != 0;
IL_0000: ldarg.1
IL_0001: ldc.i4.4
IL_0002: rem
IL_0003: ldc.i4.0
IL_0004: cgt.un
IL_0006: ret
[000006] ------------              *  STMT      void  (IL 0x000...  ???)
[000005] ------------              \--*  RETURN    int   
[000004] N--------U--                 \--*  GT        int   
[000002] ------------                    +--*  MOD       int   
[000000] ------------                    |  +--*  LCL_VAR   int    V01 arg1         
[000001] ------------                    |  \--*  CNS_INT   int    4
[000003] ------------                    \--*  CNS_INT   int    0

@mikedn
Copy link

mikedn commented Jul 17, 2019

Ah, right. I got confused because the code uses int rather than uint but then the GT_GT is actually unsigned. AFAIR I added code to transform that GT_NE though I don't remember where I put that right now. Must be somewhere in morph already.

@mikedn
Copy link

mikedn commented Jul 17, 2019

#9454 does GT_GT -> GT_NE. Not sure it's not picking up your case. Perhaps these two transforms run in the wrong order.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 17, 2019

@mikedn thanks! will try to play with the order. Btw, F# doesn't generate cgt: https://sharplab.io/#v2:DYLgZgzgNALiCWwA+B7ADgUwHYAIDKAnhDBgLYCwAUMBjDicQIw4EYCGATjgLwvtcBSHABYcAHgB8OAAxA== (ceq, ceq) and I, honestly, expected the same from Roslyn 🙂

@mikedn
Copy link

mikedn commented Jul 17, 2019

Btw, F# doesn't generate cgt:

Basically what C# does is an optimization. But AFAIR it's mentioned in ECMA so it's basically an IL idiom so it's perhaps surprising that F# does not use it.

@BruceForstall
Copy link
Member

@dotnet/jit-contrib

@danmoseley
Copy link
Member

danmoseley commented Jul 19, 2019

@EgorBo do you expect this to significantly close this regression (which of course was due to a different cause)? https://github.com/dotnet/coreclr/issues/25728
If so, perhaps it's a 3.0 candidate.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2019

@danmosemsft definitely not enough to cover even ~2% of that regression 🙁 (according to my benchmarks)

@filipnavara
Copy link
Member

@danmosemsft The regression is mostly caused by the different kernel32.dll method being invoked. The P/Invoke itself accounts for bulk of the time difference.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2019

Summary:
(Lower is better)
Total bytes of diff: -319 (-0.01% of base)
    diff is an improvement.
Top file improvements by size (bytes):
        -319 : System.Private.CoreLib.dasm (-0.01% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions by size (bytes):
           3 ( 2.78% of base) : System.Private.CoreLib.dasm - GCHandle:AddrOfPinnedObject():long:this
           1 ( 0.70% of base) : System.Private.CoreLib.dasm - GCHandle:set_Target(ref):this
           1 ( 6.67% of base) : System.Private.CoreLib.dasm - GCHandle:IsPinned(long):bool
Top method improvements by size (bytes):
         -22 (-8.15% of base) : System.Private.CoreLib.dasm - MemoryExtensions:Overlaps(struct,struct,byref):bool (2 methods)
         -18 (-2.53% of base) : System.Private.CoreLib.dasm - GregorianCalendar:AddMonths(struct,int):struct:this
         -17 (-4.35% of base) : System.Private.CoreLib.dasm - GregorianCalendar:GetDaysInYear(int,int):int:this
         -16 (-2.96% of base) : System.Private.CoreLib.dasm - Enum:TryParseRareEnum(ref,ref,struct,bool,bool,byref):bool
         -14 (-17.28% of base) : System.Private.CoreLib.dasm - DateTime:IsLeapYear(int):bool
Top method regressions by size (percentage):
           1 ( 6.67% of base) : System.Private.CoreLib.dasm - GCHandle:IsPinned(long):bool
           3 ( 2.78% of base) : System.Private.CoreLib.dasm - GCHandle:AddrOfPinnedObject():long:this
           1 ( 0.70% of base) : System.Private.CoreLib.dasm - GCHandle:set_Target(ref):this
Top method improvements by size (percentage):
         -13 (-29.55% of base) : System.Private.CoreLib.dasm - JulianCalendar:IsLeapYear(int,int):bool:this
         -14 (-17.28% of base) : System.Private.CoreLib.dasm - DateTime:IsLeapYear(int):bool
         -14 (-14.74% of base) : System.Private.CoreLib.dasm - EastAsianLunisolarCalendar:GregorianIsLeapYear(int):bool
         -10 (-10.99% of base) : System.Private.CoreLib.dasm - Random:GetSampleForLargeRange():double:this
         -13 (-10.92% of base) : System.Private.CoreLib.dasm - JulianCalendar:GetDaysInMonth(int,int,int):int:this
27 total methods with size differences (24 improved, 3 regressed), 18902 unchanged.
Completed analysis in 2.68s

@BruceForstall
Copy link
Member

@EgorBo Why are there any regressions?

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2019

@BruceForstall trying to figure out, this the first time I use jittools so probably messed up a bit.
the problematic method is

bool IsPinned(long handle) => (handle & 1) != 0; // should be a single `and` instruction
; Method ConsoleApp172.Program:IsPinned(long):bool:this

G_M43362_IG01:

G_M43362_IG02:
-      test     dl, 1
+      mov      eax, edx
+      test     al, 1
       setne    al
       movzx    rax, al

G_M43362_IG03:
       ret      
-; Total bytes of code: 10
+; Total bytes of code: 11

I guess it's what my goto did.

@mikedn
Copy link

mikedn commented Jul 19, 2019

It appears that the goto CMP_EQ_OP you have added picks up more cases of GT_GT unsigned and something doesn't work quite right CQ wise:

               [000162] N-C------U--                 |  \--*  GT        int   
               [000159] --C---------                 |     +--*  AND       long  
               [000166] ------------                 |     |  +--*  LCL_VAR   long   V06 tmp3         
               [000158] ------------                 |     |  \--*  CAST      long <- int
               [000157] ------------                 |     |     \--*  CNS_INT   int    1
               [000161] ------------                 |     \--*  CAST      long <- int
               [000160] ------------                 |        \--*  CNS_INT   int    0

The change to NE results in tree narrowing:

               [000162] N----+-N----              \--*  NE        int   
               [000159] -----+------                 +--*  AND       int   
               [000498] ------------                 |  +--*  CAST      int <- long
               [000166] -----+------                 |  |  \--*  LCL_VAR   long   V01 loc0         
               [000158] -----+------                 |  \--*  CNS_INT   int    1
               [000161] -----+------                 \--*  CNS_INT   int    0

That long to int cast should be a no-op but it actually generates mov eax, ecx.

@mikedn
Copy link

mikedn commented Jul 19, 2019

Note that the diff you did only has corelib. The entire FX diff is:

Total bytes of diff: -1765 (-0.01% of base)
    diff is an improvement.
Top file improvements by size (bytes):
        -319 : System.Private.CoreLib.dasm (-0.01% of base)
        -192 : System.Net.WebClient.dasm (-0.43% of base)
        -156 : System.Private.DataContractSerialization.dasm (-0.02% of base)
        -123 : System.Security.AccessControl.dasm (-0.18% of base)
        -120 : System.Net.Sockets.dasm (-0.07% of base)
29 total files with size differences (29 improved, 0 regressed), 100 unchanged.
Top method regressions by size (bytes):
          16 ( 0.64% of base) : System.Private.Uri.dasm - Uri:GetUriPartsFromUserString(int):ref:this
           4 ( 1.30% of base) : System.Private.Uri.dasm - Uri:CreateHostStringHelper(ref,ushort,ushort,byref,byref):ref
           3 ( 2.78% of base) : System.Private.CoreLib.dasm - GCHandle:AddrOfPinnedObject():long:this
           3 ( 0.58% of base) : System.Private.Uri.dasm - Uri:.ctor(ref,ref):this (2 methods)
           1 ( 0.70% of base) : System.Private.CoreLib.dasm - GCHandle:set_Target(ref):this
Top method improvements by size (bytes):
         -48 (-1.52% of base) : System.Private.DataContractSerialization.dasm - Base64Encoding:GetBytes(ref,int,int,ref,int):int:this (2 methods)
         -36 (-0.28% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
         -32 (-0.77% of base) : System.Private.Uri.dasm - Uri:ReCreateParts(int,ushort,int):ref:this
         -32 (-1.48% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:.ctor(int,int,int,int):this
         -31 (-3.98% of base) : Microsoft.CodeAnalysis.dasm - CommandLineParser:SplitCommandLineIntoArguments(ref,bool,byref):ref
Top method regressions by size (percentage):
           1 ( 6.67% of base) : System.Private.CoreLib.dasm - GCHandle:IsPinned(long):bool
           1 ( 5.00% of base) : System.Private.Uri.dasm - Uri:get_IsImplicitFile():bool:this
           1 ( 5.00% of base) : System.Private.Uri.dasm - Uri:get_UserDrivenParsing():bool:this
           1 ( 5.00% of base) : System.Private.Uri.dasm - Uri:get_IsDosPath():bool:this
           1 ( 5.00% of base) : System.Private.Uri.dasm - Uri:get_IsUncOrDosPath():bool:this
Top method improvements by size (percentage):
         -13 (-29.55% of base) : System.Private.CoreLib.dasm - JulianCalendar:IsLeapYear(int,int):bool:this
         -16 (-16.84% of base) : System.Security.Cryptography.Primitives.dasm - SymmetricAlgorithm:set_FeedbackSize(int):this
         -14 (-14.74% of base) : System.Data.Common.dasm - SqlDateTime:IsLeapYear(int):bool
         -14 (-14.74% of base) : System.Private.CoreLib.dasm - EastAsianLunisolarCalendar:GregorianIsLeapYear(int):bool
         -25 (-14.45% of base) : System.Reflection.Metadata.dasm - MethodBodyStreamEncoder:.ctor(ref):this
170 total methods with size differences (157 improved, 13 regressed), 146579 unchanged.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 9, 2019

Workarounded the regression (not sure how to properly fix it):

Summary:
(Lower is better)
Total bytes of diff: -1027 (-0.00% of base)
    diff is an improvement.
Top file improvements by size (bytes):
        -300 : System.Private.CoreLib.dasm (-0.01% of base)
        -156 : System.Private.DataContractSerialization.dasm (-0.02% of base)
        -123 : System.Security.AccessControl.dasm (-0.18% of base)
         -70 : System.Reflection.Metadata.dasm (-0.02% of base)
         -66 : Microsoft.CodeAnalysis.dasm (-0.01% of base)
20 total files with size differences (20 improved, 0 regressed), 109 unchanged.
Top method improvements by size (bytes):
         -48 (-1.52% of base) : System.Private.DataContractSerialization.dasm - Base64Encoding:GetBytes(ref,int,int,ref,int):int:this (2 methods)
         -32 (-1.48% of base) : System.Reflection.Metadata.dasm - MetadataBuilder:.ctor(int,int,int,int):this
         -31 (-3.98% of base) : Microsoft.CodeAnalysis.dasm - CommandLineParser:SplitCommandLineIntoArguments(ref,bool,byref):ref
         -29 (-3.73% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfInteger:Read(ref,int):ref:this
         -26 (-1.15% of base) : System.Collections.dasm - SortedSet`1:ConstructRootFromSortedArray(ref,int,int,ref):ref (2 methods)
Top method improvements by size (percentage):
         -13 (-29.55% of base) : System.Private.CoreLib.dasm - JulianCalendar:IsLeapYear(int,int):bool:this
         -16 (-16.84% of base) : System.Security.Cryptography.Primitives.dasm - SymmetricAlgorithm:set_FeedbackSize(int):this
         -14 (-14.74% of base) : System.Data.Common.dasm - SqlDateTime:IsLeapYear(int):bool
         -14 (-14.74% of base) : System.Private.CoreLib.dasm - EastAsianLunisolarCalendar:GregorianIsLeapYear(int):bool
         -25 (-14.45% of base) : System.Reflection.Metadata.dasm - MethodBodyStreamEncoder:.ctor(ref):this
66 total methods with size differences (66 improved, 0 regressed), 148723 unchanged.
Completed analysis in 12.51s

0 regressions

@@ -12745,6 +12766,11 @@ GenTree* Compiler::fgMorphSmpOp(GenTree* tree, MorphAddrContext* mac)
oper = (oper == GT_LE) ? GT_EQ : GT_NE;
tree->SetOper(oper, GenTree::PRESERVE_VN);
tree->gtFlags &= ~GTF_UNSIGNED;

if (op1->OperIs(GT_MOD))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo I was trying to review that PR and lost the track of the changes, the first part Transform X MOD C == 0 to X UMOD C == 0 looks clear, but why were this label and goto added?

and it is really scary for me to have a back edge in a function like fgMorphSmpOp especially for 500 lines length.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandreenko I wanted to handle X % C != 0 but couldn't just visit GT_NE node - Roslyn emits cgt.un (GT_GT) and morph converts it back to GT_NE but doesn't visit GT_NE after. So I have three options to fix the != 0 case:

  1. Handle unsigned GT_GT node instead of GT_NE
  2. Convert GT_GT to GT_NE early in the importer
  3. goto GT_NE (current)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be better to call fgMorphSmpOp recursively here?

Copy link
Member Author

@EgorBo EgorBo Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandreenko will try, but I think there will be a few regressions

@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your coreclr repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

@BruceForstall BruceForstall added optimization post-consolidation PRs which will be hand ported to dotnet/runtime labels Nov 7, 2019
@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen optimization post-consolidation PRs which will be hand ported to dotnet/runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants